Performability Optimization using Linear Bounds of Partially Observable Markov Decision Processes
Abstract
Markov Decision Processes (MDPs) and Partially Observable MDPs (POMDPs) have been proposed as a framework for performability management. However, solving even small POMDPs exactly is very difficult because the state spaces they induce are potentially infinite. In this paper, we present new lower bounds on the accumulated reward measure for MDPs and POMDPs. We describe how the bounds can be used in conjunction with heuristic search techniques to circumvent the state-space explosion problem in POMDPs. Our techniques can be used to choose actions that attempt to maximize performability during system recovery in self-healing systems.
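The abstract does not spell out the bounds themselves. As an illustration only, the sketch below uses a standard construction that is not necessarily the authors' own: the "blind-policy" bound. Playing a single action a forever, regardless of observations, gives a per-state value vector alpha_a. Because any such policy is feasible, V*(b) >= max_a sum_s b(s) alpha_a(s), which is a piecewise-linear lower bound in the belief b. A depth-limited lookahead over beliefs then uses this bound at its leaves to choose an action. Discounting (gamma < 1), the two-state repair model, and all function names are hypothetical assumptions. The paper's accumulated-reward measure may well be undiscounted.

```python
# Sketch: blind-policy linear lower bounds for a POMDP, used at the leaves of a
# depth-limited belief-space lookahead. Model, names, and discounting are
# hypothetical assumptions, not taken from the paper.

def blind_alpha_vectors(T, R, gamma, tol=1e-10):
    """alpha[a][s]: discounted reward of always playing action a from state s."""
    n_a, n_s = len(T), len(R)
    alphas = []
    for a in range(n_a):
        v = [0.0] * n_s
        while True:
            # One step of iterative policy evaluation for the fixed action a.
            new_v = [R[s][a] + gamma * sum(T[a][s][t] * v[t] for t in range(n_s))
                     for s in range(n_s)]
            converged = max(abs(x - y) for x, y in zip(new_v, v)) < tol
            v = new_v
            if converged:
                break
        alphas.append(v)
    return alphas


def lower_bound(belief, alphas):
    """Upper envelope of linear functions of the belief; each is <= V*(belief)."""
    return max(sum(b * x for b, x in zip(belief, alpha)) for alpha in alphas)


def belief_update(belief, a, o, T, Z):
    """Bayes filter. Returns (posterior belief, P(o | belief, a))."""
    n_s = len(belief)
    unnorm = [Z[a][t][o] * sum(T[a][s][t] * belief[s] for s in range(n_s))
              for t in range(n_s)]
    p_o = sum(unnorm)
    if p_o == 0.0:
        return None, 0.0
    return [x / p_o for x in unnorm], p_o


def lookahead(belief, depth, T, Z, R, gamma, alphas):
    """Depth-limited expectimax over beliefs with the linear bound at the leaves.
    Returns (value, best_action); the value is still a lower bound on V*."""
    if depth == 0:
        return lower_bound(belief, alphas), None
    n_s, n_a, n_o = len(belief), len(T), len(Z[0][0])
    best_value, best_action = float("-inf"), None
    for a in range(n_a):
        q = sum(belief[s] * R[s][a] for s in range(n_s))
        for o in range(n_o):
            post, p_o = belief_update(belief, a, o, T, Z)
            if p_o > 0.0:
                q += gamma * p_o * lookahead(post, depth - 1, T, Z, R,
                                             gamma, alphas)[0]
        if q > best_value:
            best_value, best_action = q, a
    return best_value, best_action


# Hypothetical toy model: states 0=healthy, 1=failed; actions 0=continue,
# 1=repair; observations 0=ok, 1=alarm (a noisy monitor).
T = [[[0.9, 0.1], [0.0, 1.0]],   # continue: failures are permanent
     [[1.0, 0.0], [1.0, 0.0]]]   # repair: always restores the healthy state
Z = [[[0.8, 0.2], [0.3, 0.7]]] * 2   # same monitor for both actions
R = [[1.0, 0.0],                 # healthy: service delivered unless repairing
     [0.0, -0.5]]                # failed: no service; repair has a cost
GAMMA = 0.9

alphas = blind_alpha_vectors(T, R, GAMMA)
value, action = lookahead([0.0, 1.0], 1, T, Z, R, GAMMA, alphas)
print(action, round(value, 4))   # prints: 1 4.2368
```

When the failure is certain, the leaf bound alone is 0, because continuing forever earns nothing. One step of lookahead shows that repairing first is worth about 4.24, so the search selects action 1. Deeper searches tighten the bound, but the tree grows as (|A|·|O|)^depth. A cheap bound at the leaves is what allows the search to stop early.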
Similar resources
Incremental Methods for Computing Bounds in Partially Observable Markov Decision Processes
Partially observable Markov decision processes (POMDPs) allow one to model complex dynamic decision or control problems that include both action outcome uncertainty and imperfect observability. The control problem is formulated as a dynamic optimization problem with a value function combining costs or rewards from multiple steps. In this paper we propose, analyse and test various incremental me...
Approximate Linear Programming for Constrained Partially Observable Markov Decision Processes
In many situations, it is desirable to optimize a sequence of decisions by maximizing a primary objective while respecting some constraints with respect to secondary objectives. Such problems can be naturally modeled as constrained partially observable Markov decision processes (CPOMDPs) when the environment is partially observable. In this work, we describe a technique based on approximate lin...
Solving Partially Observable Markov Decision Processes by Neural Networks
Partially Observable Markov Decision Processes (POMDPs) cope with sequential decision processes where an agent tries to maximize or minimize some reward without complete knowledge of the process. These models are of interest for quality control, machine maintenance, reinforcement learning, etc. More generally Monahan 99 has shown that many tasks in partially observable environments can be viewed ...
Producing Efficient Error-Bounded Solutions for Transition-Independent Decentralized MDPs
There has been substantial progress on algorithms for single-agent sequential decision making using partially observable Markov decision processes (POMDPs). A number of efficient algorithms for solving POMDPs share two desirable properties: error-bounds and fast convergence rates. Despite significant efforts, no algorithms for solving decentralized POMDPs benefit from these properties, leading ...
Geometry and Determinism of Optimal Stationary Control in Partially Observable Markov Decision Processes
It is well known that any finite-state Markov decision process (MDP) has a deterministic memoryless policy that maximizes the discounted long-term expected reward. Hence for such MDPs the optimal control problem can be solved over the set of memoryless deterministic policies. In the case of partially observable Markov decision processes (POMDPs), where there is uncertainty about the world state,...